Search | BVS Regional Portal

Strangeness-driven exploration in multi-agent reinforcement learning.

Kim, Ju-Bong; Choi, Ho-Bin; Han, Youn-Hee.

Neural Netw; 172: 106149, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38306786

ABSTRACT

In this study, a novel exploration method is introduced for multi-agent reinforcement learning (MARL) based on centralized training and decentralized execution (CTDE). The method uses the concept of strangeness, which is determined by evaluating (1) the unfamiliarity of the observations an agent encounters and (2) the unfamiliarity of the entire state the agents visit. An exploration bonus derived from strangeness is combined with the extrinsic reward obtained from the environment to form a mixed reward, which is then used to train CTDE-based MARL algorithms. In addition, a separate action-value function is proposed to prevent a high exploration bonus from overwhelming the sensitivity to extrinsic rewards during MARL training; this separate function is used to design the behavioral policy that generates transitions. The proposed method is only slightly affected by the stochastic transitions commonly observed in MARL tasks, and it improves the stability of CTDE-based MARL algorithms when they are combined with an exploration method. Through didactic examples and a demonstration of substantial performance improvements in CTDE-based MARL algorithms, we illustrate the advantages of our approach. These evaluations show that our method outperforms state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark, underscoring its effectiveness in improving MARL.
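The mixed-reward idea (extrinsic reward plus a bonus built from an observation-level term and a state-level term) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how strangeness is computed, so a simple count-based novelty score (1/sqrt of the visit count) is swapped in for both terms. The names `CountNovelty`, `mixed_reward`, and the weight `beta` are hypothetical, and the separate action-value function used for the behavioral policy is not modeled.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


class CountNovelty:
    """Hypothetical stand-in for 'strangeness': a count-based unfamiliarity score."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def bonus(self, key: Hashable) -> float:
        # Record one more visit to `key`, then return 1/sqrt(N(key)),
        # so rarely seen keys earn a larger bonus than familiar ones.
        self.counts[key] += 1
        return 1.0 / math.sqrt(self.counts[key])


def mixed_reward(
    r_ext: float,
    observations: Sequence[Hashable],
    state: Hashable,
    obs_novelty: CountNovelty,
    state_novelty: CountNovelty,
    beta: float = 0.1,
) -> float:
    # (1) Observation term: mean unfamiliarity of each agent's local observation.
    #     Assumption: one counter is shared by all agents (parameter-sharing style).
    obs_term = sum(obs_novelty.bonus(o) for o in observations) / len(observations)
    # (2) State term: unfamiliarity of the global state seen by the centralized learner.
    state_term = state_novelty.bonus(state)
    # Mixed reward = extrinsic reward + weighted exploration bonus (beta is hypothetical).
    return r_ext + beta * (obs_term + state_term)
```

Calling `mixed_reward` twice with the same observations and state returns a smaller value the second time, because both counters now treat the inputs as more familiar. Note that `bonus` updates the counts as a side effect, so each transition should be scored exactly once.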

Subject(s)

Learning, Reinforcement (Psychology), Reward, Algorithms, Benchmarking

Sortation Control Using Multi-Agent Deep Reinforcement Learning in N-Grid Sortation System.

Kim, Ju-Bong; Choi, Ho-Bin; Hwang, Gyu-Young; Kim, Kwihoon; Hong, Yong-Geun; Han, Youn-Hee.

Sensors (Basel); 20(12), 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32560217

ABSTRACT

Intralogistics is a technology that optimizes, integrates, automates, and manages the flow of goods within a logistics transportation and sortation center. As demand for parcel transportation increases, many sortation systems have been developed. In general, the goal of a sortation system is to route (or sort) parcels correctly and quickly. We design an n-grid sortation system that can be flexibly deployed in intralogistics warehouses, and we develop a collaborative multi-agent reinforcement learning (RL) algorithm to control the behavior of the emitters and sorters in the system. We present two types of RL agents, emission agents and routing agents, which are trained together to achieve the given sortation goals. To verify the proposed system and algorithm, we implement them in a full-fledged cyber-physical system simulator and describe the RL agents' learning performance. The learning results show that well-trained collaborative RL agents can optimize their performance effectively. In particular, the routing agents learn to route parcels along their optimal paths, while the emission agents learn to balance the inflow and outflow of parcels.
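The claim that routing agents learn optimal parcel paths can be illustrated with a deliberately small sketch: a single routing agent trained with tabular Q-learning on an n x n grid, where every hop costs -1 until the parcel reaches an exit cell. This is a simplification swapped in for illustration, not the paper's multi-agent deep RL algorithm or its cyber-physical simulator; the grid topology, reward, hyperparameters, and the names `step`, `best_action`, `train_router`, and `greedy_path` are all hypothetical, and emission agents and parcel congestion are not modeled.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
QTable = Dict[Tuple[Cell, int], float]
# Hypothetical moves between neighbouring grid modules: right, down, left, up.
ACTIONS: List[Cell] = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def step(cell: Cell, a: int, n: int) -> Cell:
    # Move the parcel one module; a move off the n x n grid leaves it in place.
    r, c = cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1]
    return (r, c) if 0 <= r < n and 0 <= c < n else cell


def best_action(q: QTable, cell: Cell) -> int:
    # Greedy action; unseen pairs default to 0.0 (optimistic, since rewards are negative).
    return max(range(len(ACTIONS)), key=lambda a: q.get((cell, a), 0.0))


def train_router(n: int, goal: Cell, episodes: int = 3000, alpha: float = 0.5,
                 gamma: float = 0.95, eps: float = 0.2, seed: int = 0) -> QTable:
    """Tabular Q-learning for one routing agent: every hop costs -1 until the exit."""
    rng = random.Random(seed)
    q: QTable = {}
    starts = [(r, c) for r in range(n) for c in range(n) if (r, c) != goal]
    for _ in range(episodes):
        cell = rng.choice(starts)
        for _ in range(4 * n * n):  # cap the episode length
            # Epsilon-greedy behavior: explore with probability eps.
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else best_action(q, cell)
            nxt = step(cell, a, n)
            done = nxt == goal
            # Q-learning target: hop cost plus discounted best value of the next cell.
            target = -1.0 if done else -1.0 + gamma * q.get((nxt, best_action(q, nxt)), 0.0)
            old = q.get((cell, a), 0.0)
            q[(cell, a)] = old + alpha * (target - old)
            cell = nxt
            if done:
                break
    return q


def greedy_path(q: QTable, start: Cell, goal: Cell, n: int, max_hops: int = 100) -> List[Cell]:
    # Follow the learned greedy policy from `start` until the exit (or a hop limit).
    path = [start]
    while path[-1] != goal and len(path) <= max_hops:
        path.append(step(path[-1], best_action(q, path[-1]), n))
    return path
```

For example, `greedy_path(train_router(4, (3, 3)), (0, 0), (3, 3), 4)` should trace a six-hop shortest route from corner to corner. The uniform -1 hop cost makes "fewest hops" equivalent to "highest return", which is the simplest reading of an optimal path here.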
